Search for: All records

Creators/Authors contains: "Rosen, Gail L."

« Prev Next »

Total Resources

16

Resource Type
Conference Paper

1

Conference Proceeding

0

Dataset

0

Journal Article

15

Workshop Report

0

Availability
Full Text / Resource Available

14

Citation Only

2

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Fragment databases from screened ligands for drug discovery (FDSL-DD)

https://doi.org/10.1016/j.jmgm.2023.108669

Wilson, Jerica ; Sokhansanj, Bahrad A. ; Chong, Wei Chuen ; Chandraghatgi, Rohan ; Rosen, Gail L. ; Ji, Hai-Feng ( March 2024 , Journal of Molecular Graphics and Modelling)

Free, publicly-accessible full text available March 1, 2025
Correction: Candidate variants in DNA replication and repair genes in early-onset renal cell carcinoma patients referred for germline testing

https://doi.org/10.1186/s12864-023-09486-z

Demidova, Elena V. ; Serebriiskii, Ilya G. ; Vlasenkova, Ramilia ; Kelow, Simon ; Andrake, Mark D. ; Hartman, Tiffiney R. ; Kent, Tatiana ; Virtucio, James ; Rosen, Gail L. ; Pomerantz, Richard T. ; et al ( December 2023 , BMC Genomics)

Free, publicly-accessible full text available December 1, 2024
Complet+: a computationally scalable method to improve completeness of large-scale protein sequence clustering

https://doi.org/10.7717/peerj.14779

Nguyen, Rachel ; Sokhansanj, Bahrad A. ; Polikar, Robi ; Rosen, Gail L. ( January 2023 , PeerJ)

A major challenge for clustering algorithms is to balance the trade-off between homogeneity, i.e. , the degree to which an individual cluster includes only related sequences, and completeness, the degree to which related sequences are broken up into multiple clusters. Most algorithms are conservative in grouping sequences with other sequences. Remote homologs may fail to be clustered together and instead form unnecessarily distinct clusters. The resulting clusters have high homogeneity but completeness that is too low. We propose Complet+, a computationally scalable post-processing method to increase the completeness of clusters without an undue cost in homogeneity. Complet+ proves to effectively merge closely-related clusters of protein that have verified structural relationships in the SCOPe classification scheme, improving the completeness of clustering results at little cost to homogeneity. Applying Complet+ to clusters obtained using MMseqs2’s clusterupdate achieves an increased V-measure of 0.09 and 0.05 at the SCOPe superfamily and family levels, respectively. Complet+ also creates more biologically representative clusters, as shown by a substantial increase in Adjusted Mutual Information (AMI) and Adjusted Rand Index (ARI) metrics when comparing predicted clusters to biological classifications. Complet+ similarly improves clustering metrics when applied to other methods, such as CD-HIT and linclust. Finally, we show that Complet+ runtime scales linearly with respect to the number of clusters being post-processed on a COG dataset of over 3 million sequences. Code and supplementary information is available on Github: https://github.com/EESI/Complet-Plus .
more » « less
Full Text Available
Interpretable and Predictive Deep Neural Network Modeling of the SARS-CoV-2 Spike Protein Sequence to Predict COVID-19 Disease Severity

https://doi.org/10.3390/biology11121786

Sokhansanj, Bahrad A. ; Zhao, Zhengqiao ; Rosen, Gail L. ( December 2022 , Biology)

Through the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex mutations affect COVID-19 disease severity is critical in planning public health responses as the virus continues to evolve. This paper presents a novel computational framework to complement conventional lineage classification and applies it to predict the severe disease potential of viral genetic variation. The transformer-based neural network model architecture has additional layers that provide sample embeddings and sequence-wide attention for interpretation and visualization. First, training a model to predict SARS-CoV-2 taxonomy validates the architecture’s interpretability. Second, an interpretable predictive model of disease severity is trained on spike protein sequence and patient metadata from GISAID. Confounding effects of changing patient demographics, increasing vaccination rates, and improving treatment over time are addressed by including demographics and case date as independent input to the neural network model. The resulting model can be interpreted to identify potentially significant virus mutations and proves to be a robust predctive tool. Although trained on sequence data obtained entirely before the availability of empirical data for Omicron, the model can predict the Omicron’s reduced risk of severe disease, in accord with epidemiological and experimental data.
more » « less
Full Text Available
Predicting COVID-19 disease severity from SARS-CoV-2 spike protein sequence by mixed effects machine learning

https://doi.org/10.1016/j.compbiomed.2022.105969

Sokhansanj, Bahrad A. ; Rosen, Gail L. ( October 2022 , Computers in Biology and Medicine)

Full Text Available
Physiological and evolutionary contexts of a new symbiotic species from the nitrogen-recycling gut community of turtle ants

https://doi.org/10.1038/s41396-023-01490-1

Béchade, Benoît ; Cabuslay, Christian S. ; Hu, Yi ; Mendonca, Caroll M. ; Hassanpour, Bahareh ; Lin, Jonathan Y. ; Su, Yangzhou ; Fiers, Valerie J. ; Anandarajan, Dharman ; Lu, Richard ; et al ( August 2023 , The ISME Journal)

Abstract
While genome sequencing has expanded our knowledge of symbiosis, role assignment within multi-species microbiomes remains challenging due to genomic redundancy and the uncertainties of in vivo impacts. We address such questions, here, for a specialized nitrogen (N) recycling microbiome of turtle ants, describing a new genus and species of gut symbiont—Ischyrobacter davidsoniae (Betaproteobacteria: Burkholderiales: Alcaligenaceae)—and its in vivo physiological context. A re-analysis of amplicon sequencing data, with precisely assigned Ischyrobacter reads, revealed a seemingly ubiquitous distribution across the turtle ant genus Cephalotes, suggesting ≥50 million years since domestication. Through new genome sequencing, we also show that divergent I. davidsoniae lineages are conserved in their uricolytic and urea-generating capacities. With phylogenetically refined definitions of Ischyrobacter and separately domesticated Burkholderiales symbionts, our FISH microscopy revealed a distinct niche for I. davidsoniae, with dense populations at the anterior ileum. Being positioned at the site of host N-waste delivery, in vivo metatranscriptomics and metabolomics further implicate I. davidsoniae within a symbiont-autonomous N-recycling pathway. While encoding much of this pathway, I. davidsoniae expressed only a subset of the requisite steps in mature adult workers, including the penultimate step deriving urea from allantoate. The remaining steps were expressed by other specialized gut symbionts. Collectively, this assemblage converts inosine, made from midgut symbionts, into urea and ammonia in the hindgut. With urea supporting host amino acid budgets and cuticle synthesis, and with the ancient nature of other active N-recyclers discovered here, I. davidsoniae emerges as a central player in a conserved and impactful, multipartite symbiosis.

more » « less
How Scalable Are Clade-Specific Marker K-Mer Based Hash Methods for Metagenomic Taxonomic Classification?

https://doi.org/10.3389/frsip.2022.842513

Gray, Melissa ; Zhao, Zhengqiao ; Rosen, Gail L. ( July 2022 , Frontiers in Signal Processing)

Efficiently and accurately identifying which microbes are present in a biological sample is important to medicine and biology. For example, in medicine, microbe identification allows doctors to better diagnose diseases. Two questions are essential to metagenomic analysis (the analysis of a random sampling of DNA in a patient/environment sample): How to accurately identify the microbes in samples and how to efficiently update the taxonomic classifier as new microbe genomes are sequenced and added to the reference database. To investigate how classifiers change as they train on more knowledge, we made sub-databases composed of genomes that existed in past years that served as “snapshots in time” (1999–2020) of the NCBI reference genome database. We evaluated two classification methods, Kraken 2 and CLARK with these snapshots using a real, experimental metagenomic sample from a human gut. This allowed us to measure how much of a real sample could confidently classify using these methods and as the database grows. Despite not knowing the ground truth, we could measure the concordance between methods and between years of the database within each method using a Bray-Curtis distance. In addition, we also recorded the training times of the classifiers for each snapshot. For all data for Kraken 2, we observed that as more genomes were added, more microbes from the sample were classified. CLARK had a similar trend, but in the final year, this trend reversed with the microbial variation and less unique k-mers. Also, both classifiers, while having different ways of training, generally are linear in time - but Kraken 2 has a significantly lower slope in scaling to more data.
more » « less
Full Text Available
Mapping Data to Deep Understanding: Making the Most of the Deluge of SARS-CoV-2 Genome Sequences

https://doi.org/10.1128/msystems.00035-22

Sokhansanj, Bahrad A. ; Rosen, Gail L. ( April 2022 , mSystems)
Gaglia, Marta M. (Ed.)
ABSTRACT Next-generation sequencing has been essential to the global response to the COVID-19 pandemic. As of January 2022, nearly 7 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences are available to researchers in public databases. Sequence databases are an abundant resource from which to extract biologically relevant and clinically actionable information. As the pandemic has gone on, SARS-CoV-2 has rapidly evolved, involving complex genomic changes that challenge current approaches to classifying SARS-CoV-2 variants. Deep sequence learning could be a potentially powerful way to build complex sequence-to-phenotype models. Unfortunately, while they can be predictive, deep learning typically produces “black box” models that cannot directly provide biological and clinical insight. Researchers should therefore consider implementing emerging methods for visualizing and interpreting deep sequence models. Finally, researchers should address important data limitations, including (i) global sequencing disparities, (ii) insufficient sequence metadata, and (iii) screening artifacts due to poor sequence quality control.
more » « less
Full Text Available
Predicting Institution Outcomes for Inter Partes Review (IPR) Proceedings at the United States Patent Trial & Appeal Board by Deep Learning of Patent Owner Preliminary Response Briefs

https://doi.org/10.3390/app12073656

Sokhansanj, Bahrad A. ; Rosen, Gail L. ( April 2022 , Applied Sciences)

A key challenge for artificial intelligence in the legal field is to determine from the text of a party’s litigation brief whether, and why, it will succeed or fail. This paper shows a proof-of-concept test case from the United States: predicting outcomes of post-grant inter partes review (IPR) proceedings for invalidating patents. The objectives are to compare decision-tree and deep learning methods, validate interpretability methods, and demonstrate outcome prediction based on party briefs. Specifically, this study compares and validates two distinct approaches: (1) representing documents with term frequency inverse document frequency (TF-IDF), training XGBoost gradient-boosted decision-tree models, and using SHAP for interpretation. (2) Deep learning of document text in context, using convolutional neural networks (CNN) with attention, and comparing LIME and attention visualization for interpretability. The methods are validated on the task of automatically determining case outcomes from unstructured written decision opinions, and then used to predict trial institution or denial based on the patent owner’s preliminary response brief. The results show how interpretable deep learning architecture classifies successful/unsuccessful response briefs on temporally separated training and test sets. More accurate prediction remains challenging, likely due to the fact-specific, technical nature of patent cases and changes in applicable law and jurisprudence over time.
more » « less
Full Text Available
MetaMutationalSigs: comparison of mutational signature refitting results made easy

https://doi.org/10.1093/bioinformatics/btac091

Pandey, Palash ; Arora, Sanjeevani ; Rosen, Gail L. ; Marschall, ed., Tobias ( February 2022 , Bioinformatics)

Abstract Motivation
The analysis of mutational signatures is becoming increasingly common in cancer genetics, with emerging implications in cancer evolution, classification, treatment decision and prognosis. Recently, several packages have been developed for mutational signature analysis, with each using different methodology and yielding significantly different results. Because of the non-trivial differences in tools’ refitting results, researchers may desire to survey and compare the available tools, in order to objectively evaluate the results for their specific research question, such as which mutational signatures are prevalent in different cancer types.
Results
Due to the need for effective comparison of refitting mutational signatures, we introduce a user-friendly software that can aggregate and visually present results from different refitting packages.
Availability and implementation
MetaMutationalSigs is implemented using R and python and is available for installation using Docker and available at: https://github.com/EESI/MetaMutationalSigs.

more » « less

« Prev Next »